Promote bulk gap-bridging + cleanup + workspace_swap hardening to test #37

Merged
michaeldeongreen merged 4 commits into test from dev on May 9, 2026

@michaeldeongreen

Promotes the cluster of work that's accumulated on dev since the last test promotion. DEPLOY_METHOD=bulk is the active deploy path; the test-environment Bulk API workflow will fire on this merge.

Commits being promoted

| PR | What |
|----|------|
| #33 | Docs + workflow rename + post-Phase-2 hardening + TypedDict pass |
| #34 | workspace_swap.py hardening: mandatory .env, YES confirmation, self-heal recovery |
| #35 | Chat-driven swap confirmation in /swap-to-feature prompt |
| #36 | Doc drift fix in fabric-development-process.md |

What deploys

The bulk path (reusable-deploy-bulk.yml + scripts/deploy_bulk.py) ran successfully when Phase 2 was first promoted to test (PR #31 → PR #32). No code changes to the bulk path since then; PR #33's changes were limited to the workflow rename + hardening + TypedDict (annotations) + docs.

Validation plan

  • Watch Deploy to Test (Bulk API) workflow fire on this merge
  • Watch ETL - Test chain afterward (workflow_run trigger on the deploy's success)
  • Spot-check the test workspace if anything looks off

* Rename reusable-deploy-supported.yml to reusable-deploy-fabric-cicd.yml

Symmetric naming with reusable-deploy-bulk.yml. Both reusable workflows
are now named after their underlying deploy mechanism rather than what
they deploy ('supported items' was vague — both paths deploy supported
items).

Renames:
- File: reusable-deploy-supported.yml -> reusable-deploy-fabric-cicd.yml
- Reusable workflow name: 'Reusable: Deploy Supported Items' ->
  'Reusable: Deploy via fabric-cicd'
- Reusable job name: 'Deploy supported items (...)' ->
  'Deploy via fabric-cicd (...)'
- Orchestrator job ID: deploy-supported -> deploy-fabric-cicd
  (in deploy-test.yml and deploy-prod.yml)
- Orchestrator job name: 'Deploy supported items' -> 'Deploy via fabric-cicd'

Other touches:
- reusable-deploy-bulk.yml header comments updated to reference the new
  filename (two stale references)
- scripts/deploy_fabric_cicd.py module docstring updated

Not changed in this commit (covered by the docs pass on this same branch):
- 6 references in fabric-hybrid-cicd-guide.md
- 2 narrative 'deploy supported items' phrasings that describe the
  sandwich-pattern concept, not workflow names — those stay

Caller wiring verified: deploy-test.yml and deploy-prod.yml point at the
new filename. ETL workflow_run triggers reference orchestrator names
(unchanged), so ETL chains stay intact.

Tests: 166 pass.

* Hardening: token mask, Retry-After clamping, nested VL config, log clarity

Four small post-Phase-2 cleanups applied together because they all touch
deploy_bulk.py and overlap.

1. Drop the ::add-mask:: line in acquire_token. The line itself emitted
   the bearer token to stdout before GitHub's mask filter could redact
   it, defeating the purpose. The token is never logged elsewhere, so
   no mask was needed in the first place. Test inverted to assert the
   token does NOT appear in stdout (regression guard).

2. Add POLL_CEILING_SECONDS = 600 and a _parse_retry_after helper that
   handles None, unparseable strings, and clamps to
   [POLL_FLOOR_SECONDS, POLL_CEILING_SECONDS]. Both poll_lro and
   interpret_post_response now use it. Prevents a pathological
   Retry-After value (or unparseable garbage) from either crashing the
   script or sleeping past the global polling timeout.

3. New VariableLibraryConfig nested dataclass replaces the flat
   BulkConfig.variable_library_active_value_set field. The dataclass
   shape now mirrors the YAML structure, and future VariableLibrary
   settings can be added without changing the parent shape.

4. main()'s log output now distinguishes three cases:
   - bulk-parameter.yml missing
   - present but no rules
   - present with N rules
   Cleaner mental model when debugging 'why isn't substitution
   happening?'.
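
A minimal sketch of the clamping behavior described in item 2, assuming a floor of 5 seconds (the commit only states the 600-second ceiling). The function name and `default` parameter are illustrative, not the actual `_parse_retry_after` signature:

```python
import math
from typing import Optional

POLL_FLOOR_SECONDS = 5      # assumed value; the commit does not state the floor
POLL_CEILING_SECONDS = 600  # stated in the commit message


def parse_retry_after(raw: Optional[str], default: float = POLL_FLOOR_SECONDS) -> float:
    """Return a sleep interval in seconds, clamped to [floor, ceiling]."""
    try:
        # float(None) raises TypeError; unparseable strings raise ValueError.
        # An HTTP-date Retry-After value also lands here and uses the default.
        seconds = float(raw)
    except (TypeError, ValueError):
        seconds = default
    if math.isnan(seconds):
        # "nan" parses as a float but cannot be clamped meaningfully.
        seconds = default
    return max(POLL_FLOOR_SECONDS, min(seconds, POLL_CEILING_SECONDS))
```

With these assumptions, `parse_retry_after("99999")` yields the ceiling and `parse_retry_after("garbage")` falls back to the floor, so neither case can crash the poll loop or outlast a global timeout longer than the ceiling.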

Tests: 173 pass (was 166; +7 for _parse_retry_after happy/clamp/fallback
paths and the new interpret_post_response defensive behavior).
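
The nested-config shape from item 3 might look roughly like the sketch below. The field names, the YAML key names, and the `bulk_config_from_mapping` loader are assumptions for illustration; only the class names `VariableLibraryConfig` and `BulkConfig` come from the commit:

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VariableLibraryConfig:
    # Hypothetical field name; new VariableLibrary settings would be added here.
    active_value_set: Optional[str] = None


@dataclass(frozen=True)
class BulkConfig:
    # The parent holds one nested object instead of a flat
    # variable_library_active_value_set field.
    variable_library: VariableLibraryConfig = field(default_factory=VariableLibraryConfig)


def bulk_config_from_mapping(raw: Mapping[str, Any]) -> BulkConfig:
    """Build a BulkConfig from an already-parsed YAML mapping (key names assumed)."""
    vl_raw = raw.get("variable_library") or {}
    return BulkConfig(
        variable_library=VariableLibraryConfig(
            active_value_set=vl_raw.get("active_value_set"),
        )
    )
```

Because the dataclass nesting follows the mapping nesting, adding a setting touches only `VariableLibraryConfig` and leaves `BulkConfig` unchanged.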

* Add TypedDicts for Fabric API response shapes across scripts/

Pure annotation pass. No runtime behavior change; tests pass unchanged.

Adds TypedDict classes to document the shapes the scripts send to and
receive from the Fabric REST API. They give Pylance/mypy the
information needed to catch field-name typos and to autocomplete on
response objects. Not enforced at runtime.

scripts/deploy_bulk.py — 4 new TypedDicts:
- DefinitionPart (total): one definitionParts[] element
- ImportItemDetail (partial): one importItemDefinitionsDetails[] entry
- BulkResponseBody (partial): sync 200 / LRO /result body
- LROStatusBody (partial): /v1/operations/{id} poll body
Function signatures updated: partition_dependencies, extract_item_ids,
apply_substitutions, build_definition_parts, check_per_item_status,
interpret_post_response, post_bulk, plus the body local in poll_lro.
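
To illustrate the total/partial distinction used above, here is a hedged sketch. The key names and status strings are assumptions about the response shapes, not copied from the scripts, and `is_terminal` is a hypothetical helper:

```python
from typing import TypedDict


class DefinitionPart(TypedDict):
    """Total: a type checker requires every key to be present."""
    path: str
    payload: str
    payloadType: str


class LROStatusBody(TypedDict, total=False):
    """Partial: any key may be absent, so readers should use .get()."""
    status: str
    percentComplete: int
    error: dict


def is_terminal(body: LROStatusBody) -> bool:
    # .get() avoids a KeyError when the poll body omits "status".
    # The status strings are assumed values for this sketch.
    return body.get("status") in {"Succeeded", "Failed"}
```

Neither class is enforced at runtime: both are plain dicts, and a misspelled key such as `body.get("staus")` is flagged only by Pylance or mypy.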

scripts/run_fabric_etl.py — 2 new TypedDicts:
- FabricItem (partial): one List Items value[] element
- JobStatusBody (partial): Jobs API status response
Function signatures updated: find_item_id_by_name,
interpret_poll_response, plus the items local in main().

scripts/workspace_swap.py — 1 new TypedDict:
- ItemTypeRegistryEntry (total): one ITEM_TYPES element
Replaces the union-soup annotation
`list[dict[str, str | list[str] | bool | Callable[[str], bool] | None]]`
with a single named type. Highest readability win in this pass.

scripts/deploy_fabric_cicd.py — no changes (delegates to fabric-cicd
library types; no dict shapes worth typing locally).

`headers: dict` annotations left alone in all files — they're loose
CaseInsensitiveDict mappings from requests that don't benefit from
TypedDict.

Verified:
- 173 tests pass (no regressions)
- Zero Pylance errors across all six edited files

* Docs: bulk implementation guide + bulk narrative cleanup

Adds the Bulk CI/CD Implementation Guide and updates surrounding docs to
honestly describe what the bulk path does, what it bridges, and what it
doesn't. Closes the documentation gap from Phase 2 (the bulk gap-bridging
work) and the workflow rename earlier on this branch.

New file:
- fabric-bulk-cicd-guide.md (~370 lines) — implementation guide for the
  bulk deploy path. Mirrors the structure of fabric-hybrid-cicd-guide.md
  (architecture, repo structure, deployment flow, workflows,
  configuration strategy, prerequisites, initial deploy, gotchas), with
  bulk-specific sections for the two-deploy decision, extension
  patterns, and limitations not bridged. Self-contained — duplicates
  workspace setup / SPN / GitHub environments rather than linking, so it
  reads end-to-end without bouncing between docs.

Updated docs:
- README.md — new row in the Documentation table for the bulk guide.
- fabric-cicd-release-options.md — bulk row in the comparison table now
  says 'None at the API level' with a note pointer; new note block
  explains substitution and value-set activation are caller-implemented
  workarounds, not API capabilities, and links to the new bulk guide;
  recommendation paragraph reworded to reflect that bulk is still in
  Preview and that bridging is caller responsibility.
- fabric-hybrid-cicd-guide.md — workflow rename (6 occurrences of the
  old reusable-deploy-supported.yml filename), repo structure tree
  updated with all currently-existing workflows / scripts / data items,
  new note pointing to the bulk path with link to the bulk guide.
- .github/workflows/reusable-deploy-bulk.yml — Known Gaps comment block
  expanded honestly: distinguishes API gaps from caller-bridged gaps,
  notes orphan cleanup is not implemented because the API doesn't
  support delete.
- scripts/deploy_bulk.py — module docstring expanded with the same
  framing (API gaps vs caller bridging) and the Known Gaps list now
  distinguishes 'no full parameter.yml feature coverage' (specific) from
  'no orphan cleanup' (broader).

Verified:
- 173 tests still pass
- Zero stale references to reusable-deploy-supported.yml or the old
  deploy-supported job ID anywhere in the repo
- All cross-links between docs resolve to real anchors

* Swap to feature workspace + UTF-8 stdout fix for workspace_swap

scripts/workspace_swap.py:
- Force UTF-8 reconfigure on sys.stdout/sys.stderr at module load.
  Without this, the Unicode box-drawing characters used in section
  headers and the summary banner crash on Windows consoles that default
  to cp1252. Surfaced when running --dry-run end-to-end for the first
  time.
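
A small sketch of the kind of module-load reconfigure described above. The helper name `force_utf8` is hypothetical, and the guard for streams without `reconfigure()` is an assumption about how one might make it safe under redirected or captured output:

```python
import sys
from typing import TextIO


def force_utf8(stream: TextIO) -> None:
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    # io.TextIOWrapper has had reconfigure() since Python 3.7; replaced
    # streams (for example io.StringIO) do not, so they are skipped.
    reconfigure = getattr(stream, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding="utf-8")


# Applied once at import time so later print() calls can emit
# box-drawing characters on consoles that default to cp1252.
for _stream in (sys.stdout, sys.stderr):
    force_utf8(_stream)
```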

data/fabric (workspace_swap output for branch test-workspace-swap-changes):
- New per-branch value set: Patterns_Variables.VariableLibrary/valueSets/
  test-workspace-swap-changes.json
- settings.json valueSetsOrder updated
- Patterns_Semantic_Model.SemanticModel: Direct Lake URL repointed
- Import_Patterns_Data.Notebook: META block lakehouse dependency
  repointed

* Update

* Harden workspace_swap: mandatory .env, YES confirmation, self-heal recovery

Surfaced by a real-world incident where stale .env values produced a
swap pointed at the wrong workspace and the script's one-way find/replace
couldn't recover from the resulting bad state.

scripts/workspace_swap.py changes:

1. resolve_feature_ids() now requires .env. Removes the value-set-priority
   path and the interactive-prompt fallback. Both were silently overriding
   what the developer typed in .env. Missing .env, missing keys, or blank
   values → sys.exit() with a clear error pointing at .env.sample.

2. New _confirm_swap_to_feature() helper. Prints a 'Planned swap' summary
   showing dev → feature for both workspace and lakehouse, then requires
   the user to type literal 'YES' (case-sensitive, exact match). Anything
   else → abort cleanly. Skipped on --dry-run since the dry-run is itself
   the verification step.

3. New _read_previous_feature_ids() helper + recovery pass in
   _run_swap_to_feature(). Reads the previously-applied feature IDs from
   the per-branch value-set file BEFORE the value set is rewritten in
   step 3, then runs a second find/replace pass in step 4 to rewrite any
   stale feature IDs that remain in repointable files. Skipped when the
   previous IDs match the target (no recovery needed) or when there's no
   previous record (first swap).

4. --check-ready and --swap-to-dev are unaffected. swap-to-dev still
   reads from the value-set file as the source of feature IDs.
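
The confirmation gate in item 2 can be sketched as follows. This is not the script's `_confirm_swap_to_feature()`; the signature, the injectable `read_line` parameter, and the choice to reject surrounding whitespace are assumptions made for illustration:

```python
from typing import Callable


def confirm_swap(
    dev_workspace: str,
    feature_workspace: str,
    dev_lakehouse: str,
    feature_lakehouse: str,
    dry_run: bool = False,
    read_line: Callable[[str], str] = input,
) -> bool:
    """Return True only when the user types the literal string YES."""
    if dry_run:
        # The dry run is itself the verification step, so no prompt.
        return True
    print("Planned swap")
    print(f"  workspace: {dev_workspace} -> {feature_workspace}")
    print(f"  lakehouse: {dev_lakehouse} -> {feature_lakehouse}")
    try:
        answer = read_line("Confirm: ")
    except EOFError:
        # Closed stdin counts as "no".
        return False
    # Exact, case-sensitive match: "yes", "", and " YES " are all rejected here.
    return answer == "YES"
```

Passing `read_line` as a parameter keeps the helper testable without patching `input`.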

tests/test_workspace_swap.py changes:

- Removed the value-set-priority and prompt-fallback tests
- Added 7 strict 'exits cleanly' tests for resolve_feature_ids
- New TestConfirmSwapToFeature class: 7 tests covering YES /
  lowercase / blank / EOF / whitespace / planned-summary-display
- New TestReadPreviousFeatureIds class: 4 tests covering the helper
- Wrapped existing _run_swap_to_feature test calls in YES input mock
- Added test_dry_run_skips_confirmation
- Added test_recovers_from_stale_feature_ids_in_files (regression
  guard for today's incident)
- Added test_no_recovery_pass_when_stale_matches_target

Verified: 185 tests pass (was 173).
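
The two-pass idea behind the recovery pass can be sketched on a single string. The function name, the dict-of-IDs shape, and the keys are hypothetical; the real script works on tracked files and reads the previous IDs from the per-branch value-set file:

```python
from typing import Dict, Optional


def repoint(
    text: str,
    dev_ids: Dict[str, str],
    target_ids: Dict[str, str],
    previous_ids: Optional[Dict[str, str]] = None,
) -> str:
    """Rewrite dev IDs to target IDs, then heal any stale feature IDs."""
    # Pass 1: the normal one-way find/replace, dev -> target.
    for key, dev_id in dev_ids.items():
        text = text.replace(dev_id, target_ids[key])
    # Pass 2 (recovery): skipped when there is no previous record (first
    # swap) or when the previous IDs already match the target.
    if previous_ids and previous_ids != target_ids:
        for key, stale_id in previous_ids.items():
            if stale_id != target_ids[key]:
                text = text.replace(stale_id, target_ids[key])
    return text
```

The second pass only works while the record of previously applied IDs survives, which is why that record has to be read before the value set is rewritten.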

* Fix corrupted feature IDs in SemanticModel + Notebook (manual recovery)

The first run of workspace_swap on this branch (commit 9aebf8f) was
driven by stale .env values, applying the wrong workspace ID
(613e22cd-308a-4106-9745-58bd5164568a) and the wrong lakehouse ID
(1b80046f-d76e-4748-acfc-575f3fbd23f4) into:
  - Patterns_Semantic_Model.SemanticModel/definition/expressions.tmdl
  - Import_Patterns_Data.Notebook/notebook-content.py

The Fabric UI subsequently corrected the value-set file (commit 7f7ac85),
which severed the breadcrumb the new self-heal recovery pass uses to
discover stale IDs. So the recovery pass can't help with this specific
state; the fix is manual.

Replaces:
  613e22cd-308a-4106-9745-58bd5164568a → 1484d4b0-4c88-4347-9cce-2d2bd03848b0
  1b80046f-d76e-4748-acfc-575f3fbd23f4 → 914bb153-d687-4ea9-9870-d61434b2884f

Also normalizes the trailing newline in the value-set file (added by
the script's JSON serializer; cosmetic).

After this commit, all .env GUIDs match what's in the tracked Fabric
files. Going forward, the script's confirmation prompt + self-heal
recovery will catch this class of incident automatically.

* Ghost

* Swap back to dev for PR readiness

Reverts the test-workspace-swap-changes branch to dev IDs ahead of opening
the PR to dev. The check-pr-ready CI workflow enforces this — feature IDs
in tracked Fabric files would block the merge.

Reverted by scripts/workspace_swap.py --swap-to-dev:
- Patterns_Semantic_Model.SemanticModel/definition/expressions.tmdl
  (Direct Lake URL: feature → dev)
- Import_Patterns_Data.Notebook/notebook-content.py
  (META block: feature → dev)
- Patterns_Variables.VariableLibrary/valueSets/test-workspace-swap-changes.json
  (deleted — per-branch value set no longer needed)
- Patterns_Variables.VariableLibrary/settings.json
  (test-workspace-swap-changes removed from valueSetsOrder)

* Swap to feature workspace + chat-driven confirmation in prompt

Test of the updated swap-to-feature.prompt.md flow that moves the YES
confirmation into the chat UI instead of a live terminal prompt.

scripts/workspace_swap.py output for branch swap-file-smoke-test:
- New per-branch value set: Patterns_Variables.VariableLibrary/valueSets/
  swap-file-smoke-test.json
- settings.json valueSetsOrder updated
- Patterns_Semantic_Model.SemanticModel: Direct Lake URL repointed to
  feature workspace + lakehouse
- Import_Patterns_Data.Notebook: META block lakehouse dependency
  repointed to feature workspace + lakehouse

.github/prompts/swap-to-feature.prompt.md updated to instruct the agent
to ask for confirmation in the chat UI (vscode_askQuestions) and then
pipe YES into the script (echo "YES" | python ...) so the terminal
never blocks on the script's interactive Confirm: prompt.
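
For reference, the same piping pattern expressed in Python rather than shell. The child process here is a one-line stand-in that reads a single line from stdin, not the real workspace_swap.py, and `run_with_answer` is a hypothetical helper:

```python
import subprocess
import sys

# Stand-in for a script with an interactive prompt: exits 0 only on "YES".
CHILD = "import sys; answer = input('Confirm: '); sys.exit(0 if answer == 'YES' else 1)"


def run_with_answer(answer: str) -> int:
    """Run the stand-in child with `answer` piped to its stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=answer + "\n",   # equivalent of: echo "<answer>" | python ...
        text=True,
        capture_output=True,   # keep the child's prompt out of our stdout
    )
    return result.returncode
```

Because stdin is supplied up front, the child never waits on a terminal, which is the property the prompt file relies on.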

* Swap back to dev for PR readiness

Reverts the swap-file-smoke-test branch to dev IDs ahead of opening
the PR to dev. The check-pr-ready CI workflow enforces this — feature
IDs in tracked Fabric files would block the merge.

Reverted by scripts/workspace_swap.py --swap-to-dev:
- Patterns_Semantic_Model.SemanticModel/definition/expressions.tmdl
  (Direct Lake URL: feature → dev)
- Import_Patterns_Data.Notebook/notebook-content.py
  (META block: feature → dev)
- Patterns_Variables.VariableLibrary/valueSets/swap-file-smoke-test.json
  (deleted — per-branch value set no longer needed)
- Patterns_Variables.VariableLibrary/settings.json
  (swap-file-smoke-test removed from valueSetsOrder)
… pass (#36)

PR #34 made .env mandatory for swap-to-feature, removed the interactive
prompt fallback, removed the value-set-priority shortcut, and added a
case-sensitive YES confirmation. PR #35 moved that confirmation into the
chat UI for slash-command invocations. fabric-development-process.md still
described the old behavior in three spots.
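
A rough sketch of what "mandatory .env" means in practice, under stated assumptions: the key names in `REQUIRED_KEYS`, the minimal `KEY=value` parser, and the function names are all hypothetical and do not reproduce the script's `resolve_feature_ids()`:

```python
import sys
from pathlib import Path
from typing import Dict

# Hypothetical key names; the real ones are listed in .env.sample.
REQUIRED_KEYS = ("FEATURE_WORKSPACE_ID", "FEATURE_LAKEHOUSE_ID")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_required_feature_ids(env_path: Path) -> Dict[str, str]:
    """Return the feature IDs from .env, or exit; there is no prompt fallback."""
    if not env_path.is_file():
        sys.exit(f"{env_path} not found; copy .env.sample and fill in the feature IDs")
    values = parse_env(env_path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        sys.exit(f"{env_path} has missing or blank values for: {', '.join(missing)}")
    return {key: values[key] for key in REQUIRED_KEYS}
```

Exiting on a missing file, a missing key, or a blank value means nothing can silently override what the developer typed in .env.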

Updates:
- Step 4 (swap-to-feature): drop the 'or prompts if .env is missing'
  claim; add the YES confirmation step; add the recovery pass step.
- Local .env Setup section: replace 'falls back to interactive prompt'
  + 'value set is source of truth' with the new model (.env always
  authoritative for swap-to-feature; value set used by swap-to-dev
  and the recovery pass; YES confirmation gate).
- Copilot Chat section: note that /swap-to-feature moves the YES
  confirmation into the chat UI.

Tests: 185 pass (no script changes).

michaeldeongreen merged commit 2137364 into test on May 9, 2026
2 checks passed